Abstract

Like a scientist or a playing child, PowerPlay [24] not only learns new skills to solve given
problems, but also invents new interesting problems by itself. By design, it continually comes up with the
fastest to find, initially novel, but eventually solvable tasks. It also continually simplifies or compresses
or speeds up solutions to previous tasks. Here we describe first experiments with PowerPlay. A
self-delimiting recurrent neural network, the SLIM RNN [25], is used as a general computational problem solving
architecture. Its connection weights can encode arbitrary, self-delimiting, halting or non-halting
programs affecting both environment (through effectors) and internal states encoding abstractions of event
sequences. Our PowerPlay-driven SLIM RNN learns to become an increasingly general solver of
self-invented problems, continually adding new problem solving procedures to its growing skill repertoire.
Extending a recent conference paper [28], we identify interesting, emerging, developmental stages of our
open-ended system. We also show how it automatically self-modularizes, frequently re-using code for
previously invented skills, always trying to invent novel tasks that can be quickly validated because they
do not require too many weight changes affecting too many previous tasks.

1 Introduction 

To automatically construct an increasingly general problem solver, the recent PowerPlay framework
[24] incrementally and efficiently searches the space of possible pairs of (1) new task descriptions (from
the set of all computable task descriptions), and (2) modifications of the current problem solver. The search 
continues until the first pair is discovered for which (i) the current solver cannot solve the new task, and 
(ii) the modified solver provably solves all previously learned tasks plus the new one. Here a new task may 
actually be to simplify, compress, or speed up previous solutions, which in turn may invoke or partially 
re-use solutions to other tasks. The above process of discovering and solving a novel task can be repeated 
forever in open-ended fashion. 

As a concrete implementation of the solver, we use a special neural network (NN) [2] architecture
called the Self-Delimiting NN or SLIM NN [25]. Given a SLIM NN that can already solve a finite known
set of previously learned tasks, an asymptotically optimal program search algorithm [9, 26, 20, 21] can be
used to find a new pair that provably has properties (i) and (ii). Once such a pair is found, the cycle repeats 
itself. This results in a continually growing set of tasks solvable by an increasingly more powerful solver. 
The resulting repertoire of self-invented problem-solving procedures or skills can be exploited at any time 
to solve externally posed tasks. 

The SLIM NN has modifiable components, namely, its connection weights. By keeping track of which 
tasks depend on each connection, PowerPlay can reduce the time required for testing previously solved 
tasks with certain newly modified connection weights, because only tasks that depend on the changed 
connections need to be retested. If the solution of the most recently invented task does not require changes 
of many weights, and if the changed connections do not affect many previous tasks, then validation may be 
very efficient. Since PowerPlay's efficient search process has a built-in bias towards tasks whose validity 
check requires little computational effort, there is an implicit incentive to generate weight modifications 
that do not impact too many previous tasks. This leads to a natural decomposition of the space of tasks 
and their solutions into more or less independent regions. Thus, divide and conquer strategies are natural
by-products of PowerPlay. 
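
The bookkeeping described above can be illustrated with a minimal Python sketch. It assumes that a connection is identified by a (source, target) pair and that the set of connections used by each stored task solution is known; the class name `DependencyTracker` and the task identifiers are hypothetical and do not come from the original implementation.

```python
from collections import defaultdict


class DependencyTracker:
    """Maps each connection to the set of tasks whose solution uses it."""

    def __init__(self):
        self.tasks_using = defaultdict(set)

    def record(self, task_id, used_connections):
        # Remember that solving `task_id` touched these connections.
        for connection in used_connections:
            self.tasks_using[connection].add(task_id)

    def tasks_to_retest(self, changed_connections):
        # Only tasks depending on a changed connection need re-validation.
        affected = set()
        for connection in changed_connections:
            affected |= self.tasks_using.get(connection, set())
        return affected


tracker = DependencyTracker()
tracker.record("task_1", {(0, 3), (3, 7)})
tracker.record("task_2", {(1, 4), (4, 7)})
tracker.record("task_3", {(0, 3), (4, 7)})

# Changing the weight on connection (1, 4) only affects task_2.
print(tracker.tasks_to_retest({(1, 4)}))
```

A weight change that touches no recorded connection yields an empty retest set, which is the cheapest possible validation. A complete implementation would also have to refresh the records whenever the solution of an old task starts using different connections.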

Note that active learning methods [5] such as AdaBoost [6] have a totally different set-up and purpose:
there the user provides a set of samples to be learned, then each new classifier in a series of classifiers 
focuses on samples badly classified by previous classifiers. In open-ended PowerPlay, however, all 
computational tasks (not necessarily classification tasks) can be self-invented; there is no need for a pre- 
defined global set of tasks that each new solver tries to solve better, instead the task set continually grows 
based on which task is easy to invent and validate, given what is already known. 

Unlike our first implementations of curious / creative / playful agents from the 1990s lfT7ll29l[T8l (c/. lH] 
|4][T3][Ill), PowerPlay provably (by design) does not have any problems with online learning — it cannot 
forget previously learned skills, automatically segmenting its life into a sequence of clearly identified tasks 
with explicitly recorded solutions. Unlike the task search of theoretically optimal creative agents [22, 23],
PowerPlay's task search is greedy, yet practically feasible. Here we present first experiments, extending
recent work [28].

2 Notation & Algorithmic Framework for PowerPlay (Variant II) 

We use the notation of the original paper [24], and briefly review the basics relevant here. $B^*$ denotes the
set of finite bit strings over the binary alphabet $B = \{0, 1\}$, $\mathbb{N}$ the natural numbers, $\mathbb{R}$ the real numbers. The
computational architecture of PowerPlay's problem solver may be a deterministic universal computer,
or a more limited device such as a feedforward NN. All problem solvers can be uniquely encoded [7] or
implemented on universal computers such as universal Turing Machines (TM) [31]. Therefore, without loss
of generality, we can assume a fixed universal reference computer whose inputs and outputs are elements
of $B^*$. User-defined subsets $\mathcal{S}, \mathcal{T} \subset B^*$ define the sets of possible problem solvers and task descriptions.
For example, $\mathcal{T}$ may be the infinite set of all computable tasks, or a small subset thereof. $\mathcal{P} \subset B^*$ defines
a set of possible programs which may be used to generate or modify members of $\mathcal{S}$ or $\mathcal{T}$. If our solver
is a feedforward NN, then $\mathcal{S}$ could be a highly restricted subset of programs encoding the NN's possible
topologies and weights, $\mathcal{T}$ could be encodings of input-output pairs for a supervised learning task, and $\mathcal{P}$
could be an algorithm that modifies the weights of the network.

The problem solver's initial program is called $s_0$. A particular sequence of unique task descriptions
$T_1, T_2, \ldots$ (where each $T_i \in \mathcal{T}$) is chosen or "invented" by a search method (see below) such that the
solutions of $T_1, \ldots, T_i$ can be computed by $s_i$, the $i$-th instance of the program, but $T_i$ cannot be solved
by $s_{i-1}$. Each $T_i$ consists of a unique problem identifier that can be read by $s_i$ through some built-in
mechanism (e.g., input neurons of an NN as in Sec. 3 and 4), and a unique description of a deterministic
procedure for deciding whether the problem has been solved. For example, a simple task may require
the solver to answer a particular input pattern with a particular output pattern. Or it may require the
solver to steer a robot towards a goal through a sequence of actions. Denote $T_{\le i} = \{T_1, \ldots, T_i\}$;
$T_{<i} = \{T_1, \ldots, T_{i-1}\}$. A valid task $T_i$ ($i > 1$) may require solving at least one previously solved task $T_k$ ($k < i$)
more efficiently, by using fewer resources such as storage space, computation time, energy, etc., quantified
by the function $\mathit{Cost}(s, T)$. The algorithmic framework (Alg. 1) incrementally trains the problem solver
by finding $p \in \mathcal{P}$ that increase the set of solvable tasks. For more details, the reader is encouraged to refer
to the original report [24].

Algorithm 1 PowerPlay Framework (Variant II)

Initialize $s_0$ in some way
for i := 1, 2, . . . do
    Declare new global variables $T_i \in \mathcal{T}$, $s_i \in \mathcal{S}$, $p_i \in \mathcal{P}$, $c_i, c^*_i \in \mathbb{R}$ (all unassigned)
    repeat
        Let a search algorithm (e.g., Section 3) set $p_i$, a new candidate program. Give $p_i$ limited time to do:
        * Task Invention: Unless the user specifies $T_i$, let $p_i$ set $T_i$.
        * Solver Modification: Let $p_i$ set $s_i$ by computing a modification of $s_{i-1}$.
        * Correctness Demonstration: Let $p_i$ compute $c_i := \mathit{Cost}(s_i, T_{\le i})$ and $c^*_i := \mathit{Cost}(s_{i-1}, T_{\le i})$
    until $c^*_i - c_i > \epsilon$ (minimal savings of costs such as time/space/etc. on all tasks so far)
    Freeze/store forever $p_i, T_i, s_i, c_i, c^*_i$
end for
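
To make the control flow of Algorithm 1 concrete, the following Python sketch runs the loop on a toy problem. Everything specific here is an assumption made for illustration: the solver is a lookup table from integers to bits (answering 0 for unknown inputs), a task is an (input, label) pair, the cost is the number of tasks answered wrongly, and `candidate_programs` is a hypothetical fixed enumeration standing in for the search algorithm.

```python
import itertools


def cost(solver, tasks):
    """Toy cost: number of tasks the solver answers wrongly."""
    return sum(1 for x, label in tasks if solver.get(x, 0) != label)


def candidate_programs():
    """Hypothetical search order: propose (input, label) pairs in a fixed order."""
    for x in itertools.count():
        for label in (0, 1):
            yield x, label


def powerplay(num_tasks, eps=0.5):
    solver = {}   # s_0: answers 0 for every input
    tasks = []    # accepted tasks T_1, ..., T_{i-1}
    history = []
    for _ in range(num_tasks):
        for x, label in candidate_programs():
            task = (x, label)                       # task invention
            new_solver = {**solver, x: label}       # solver modification
            trial_tasks = tasks + [task]
            c_new = cost(new_solver, trial_tasks)   # c_i
            c_old = cost(solver, trial_tasks)       # c*_i
            if c_old - c_new > eps:                 # correctness demonstration
                break
        # Freeze the accepted task and solver.
        solver, tasks = new_solver, trial_tasks
        history.append((task, c_old, c_new))
    return solver, tasks, history


solver, tasks, history = powerplay(3)
print(tasks)
```

With this cost and any threshold between 0 and 1, a candidate is accepted only if the old solver fails the new task while the modified solver solves it together with all earlier tasks, which mirrors conditions (i) and (ii) of the introduction.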



3 Experiment 1: Self-Invented Pattern Recognition Tasks 

We start with pattern classification tasks. In this setup, $s$ encodes an arbitrary set of weights for a
fixed-topology multi-layer perceptron (MLP). The MLP maps two-dimensional, real-valued input vectors from
the unit square to binary labels; i.e., $s: [0, 1) \times [0, 1) \to \{0, 1\}$. The output label is 0 or 1 depending on
whether or not the real-valued activation of the MLP's single output neuron exceeds 0.5. Binary programs
$p \in \mathcal{P}$ of length $\mathit{length}(p)$ compute tasks and modify $s$ as follows. If $p^1$ (the first bit of $p$) is 0, this will
specify that the current task is to simplify $s$ by weight decay, under the assumption that smaller weights are
simpler. Such programs implement compression tasks. But if $p^1$ is 1, then the target label of the current
task candidate $T$ will be given by the next bit $p^2$, and $T$'s two-dimensional input vector will be uniquely
encoded by the remainder of $p$'s bit string, $p^3 p^4 \ldots p^n$, as follows. The string $p^3 p^4 \ldots p^n$ is taken as the
binary representation of an integer $N$. Then a 2D Gaussian pseudo-random number generator is used to
generate numbers $(x_1, y_1), (x_2, y_2), \ldots$, where $x$ and $y$ are used as 2D coordinates in the unit square. Now
the task is to label the coordinates $(x_N, y_N)$ as $p^2$.

The random number generator is re-seeded by the same seed every time a new task search begins, thus
ensuring a deterministic search order. Since we only have two labels in this experiment, we do not need
$p^2$, as we can choose the target label to be different from the label currently assigned by the MLP to the
encoded input. To run $p$ for $t$ steps (on a training set of $i$ patterns so far) means to execute $\lfloor t/2i \rfloor$ epochs of
gradient descent on the training set and check whether the patterns are correctly classified. Here one step
always refers to the processing of a single pattern (either a forward or backward pass), regardless of the
task, so that one epoch over $i$ patterns costs $2i$ steps.
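
The decoding of a program into a task can be sketched as follows. This is only an illustration under stated assumptions: programs are Python strings of '0' and '1', the point index is taken as zero-based, and a uniform generator from Python's `random` module is swapped in for the 2D Gaussian generator mentioned above. The function name `decode_program` and the seed value are hypothetical.

```python
import random


def decode_program(bits, seed=0):
    """Decode a bit string into a task description.

    Returns ("compress", None, None) for compression programs,
    ("classify", (x, y), label) for pattern tasks, and None if the
    string is too short to encode a pattern task.
    """
    if not bits or any(b not in "01" for b in bits):
        raise ValueError("program must be a non-empty string of 0s and 1s")
    if bits[0] == "0":
        return ("compress", None, None)
    if len(bits) < 3:
        return None
    label = int(bits[1])          # second bit: target label
    index = int(bits[2:], 2)      # remaining bits: integer N
    rng = random.Random(seed)     # same seed every time: deterministic order
    for _ in range(index + 1):    # advance to the N-th generated point
        point = (rng.random(), rng.random())
    return ("classify", point, label)


print(decode_program("0"))
print(decode_program("11101"))
```

Because the generator is re-seeded on every call, the same program always denotes the same input pattern, which is what makes the search order reproducible.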

Assume now that PowerPlay has already learned a version of $s$ called $s_{i-1}$ able to classify $i - 1$
previously invented training patterns ($i > 1$). Then the next task is defined by a simple enumerative search
in the style of universal search [10, 26, 21], which combines task simplification and systematic run-time
growth (see Alg. 2).

Algorithm 2 PowerPlay implementation for experiment 1
Initialize $s_0$ in some way
for i := 1, 2, . . . do
    for m := 1, 2, . . . do
        for all candidate programs p s.t. length(p) ≤ m do
            Run p for at most $2^{m - \mathit{length}(p)}$ steps
            if p creates $s_i$ from $s_{i-1}$ correctly classifying all i training patterns so far and ($s_i$ is substantially simpler
            than $s_{i-1}$ or $s_i$ can classify a newly found pattern misclassified by $s_{i-1}$) then
                Set $p_i := p$ (store the candidate)
                exit m loop
            end if
        end for
    end for
end for
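
The inner two loops of Algorithm 2 follow the usual pattern of universal search: in phase m, every program of length at most m receives a step budget that halves with each additional bit. The Python sketch below shows this schedule in isolation, assuming a budget of 2^(m - length(p)) steps. The callback `run` and the toy success criterion in `demo_run` are hypothetical placeholders for training and validating the MLP.

```python
from itertools import product


def programs_up_to(m):
    """Enumerate all bit strings of length 1..m, shortest first."""
    for length in range(1, m + 1):
        for bits in product("01", repeat=length):
            yield "".join(bits)


def search(run, max_phase=20):
    """Return (program, phase, budget) for the first success, else None.

    `run(p, budget)` must report whether program p succeeds within
    `budget` steps.
    """
    for m in range(1, max_phase + 1):
        for p in programs_up_to(m):
            budget = 2 ** (m - len(p))
            if run(p, budget):
                return p, m, budget
    return None


def demo_run(p, budget):
    # Toy criterion: only program "101" works, and it needs 4 steps.
    return p == "101" and budget >= 4


print(search(demo_run))
```

Short programs are retried with exponentially growing budgets, so a solution that is short but slow and one that is long but fast are both found after a comparable total effort.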
[Figure 1, bottom row: (d) After 25 tasks, (e) After 33 tasks, (f) After 43 tasks]

Figure 1: Experiment 1. Right after initialization, before the first compressions, the decision boundary may
be arbitrary and possibly non-linear. The drive to compress and simplify, however, first encourages linear
separability (top row). As more associations are invented, it becomes harder and harder to learn new ones
that break the previous solver's generalization ability while maintaining a linear boundary. Eventually
this causes the decision boundary to become non-linear (bottom row), and increasingly so as more and
more associations are invented and learned.

Since the compression task code is the single bit '0', roughly half of the total search time is spent 
on simplification, the rest is spent on the invention of new training patterns that break the MLP's current 
generalization ability. 

To monitor the evolution of the solver's generalization map, after each successful search for a new task, 
the labels of grid points are plotted in a rather dense grid on the unit square (Fig. 1), to see how the MLP
maps $[0, 1) \times [0, 1)$ to $\{0, 1\}$. As expected, the experiments show that in the beginning PowerPlay prefers
to invent and learn simple linear functions. However, there is a phase transition to more complex non-linear
functions after a few tasks, indicating a new developmental stage [14, 19, 12]. This is a natural by-product
of the search for simple tasks, which are easier to invent and verify than more complex non-linear tasks.
As learning proceeds, we observe that the decision boundary becomes increasingly non-linear, because the
system has to come up with tasks which the solver cannot solve yet, but the solver becomes increasingly
more powerful, so the system has to invent increasingly harder tasks. On the other hand, the search time
for solutions to harder and harder tasks need not grow over time, since new solutions do not have to be
learnt from scratch, but may re-use previous solutions encoded as parts of the previous solver.

4 Experiment 2: Self-Invented Tasks Involving Motor Control and
Internal Abstractions

4.1 Self-Delimiting (SLIM) Programs Run on A Recurrent Neural Network (RNN) 

Here we describe experiments with a PowerPlay-based RNN that continually invents novel sequences
of actions affecting an external environment, over time becoming a more and more general solver of
self-invented problems.
[Figure 2 panels: (c) t = 3, (d) t = 4]

Figure 2: SLIM RNN activation scheme. At various time steps, active/winning neurons and their outgoing
connections are highlighted. At each step, at most one neuron per WITAS can become active and propagate
activations through its outgoing connections.

RNNs are general computers that allow for both sequential and parallel computations. Given enough
neurons and an appropriate weight matrix, an RNN can compute any function computable by a standard
PC [16]. We use a particular RNN named SLIM RNN [25] to define $\mathcal{S}$ for our experiment. Here we briefly
review its basics.

The $k$-th computational unit or neuron of our SLIM RNN is denoted $u^k$ ($0 < k \le n(u) \in \mathbb{N}$). $w^{lk}$ is the
real-valued weight on the directed connection $c^{lk}$ from $u^l$ to $u^k$. At discrete time step $t = 1, 2, \ldots, t_{end}$ of a
finite interaction sequence with the environment, $u^k(t)$ denotes the real-valued activation of $u^k$. There are
designated neurons serving as online inputs, which read real-valued observations from the environment,
and outputs whose activations encode actions in the environment, e.g., the movement commands for a
robot. We initialize all $u^k(1) = 0$ and compute $u^k(t + 1) = f^k(\sum_l w^{lk} u^l(t))$, where $f^k$ may be of the form
$f^k(x) = 1/(1 + e^{-x})$, or $f^k(x) = x$, or $f^k(x) = 1$ if $x > 0$ and $0$ otherwise. To program the SLIM RNN
means to set the weight matrix $\langle w^{lk} \rangle$.

A special feature of the SLIM RNN is that it has a single halt neuron with a fixed halt-threshold.
If at any time $t$ its activation exceeds the halt-threshold, the network's computation stops. Thus, any
network topology in which there exists a path from the online or task inputs to the halt neuron can run
self-delimiting programs [10, 3, 26, 21] studied in the theory of Kolmogorov complexity and algorithmic
probability [27, 8]. Inspired by a previous architecture [15], neurons other than the inputs and outputs in
our RNN are arranged in winner-take-all subsets (WITAS) of $n_{witas}$ neurons each ($n_{witas} = 4$ was used
for this experiment). At each time step $t$, $u^k(t)$ is set to 1 if $u^k$ is a winning neuron in some WITAS (the
one with the highest activation), and to 0 otherwise. This feature gives the SLIM RNN the potential to
modularize itself, since neurons can act as gates to various self-determined regions of the network. By
regulating the information flow, the network may use only a fraction of the weights $\langle w^{lk} \rangle$ for a given task.
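
As an illustration of the activation scheme, the following Python sketch runs a tiny network with one input, two WITAS of two neurons each, and a halt neuron. It is a simplified reading of the description above, not the original implementation: weights are stored in a dictionary keyed by (source, target), a connection counts as used when its source neuron is active, the halt neuron uses the identity activation, and the default thresholds are assumptions. The toy weights are invented for the example.

```python
def run_slim_rnn(weights, inputs, blocks, halt_idx,
                 halt_threshold=3.0, win_threshold=1e-5, max_steps=50):
    """Run until the halt neuron fires; return (halt_step, activations, used)."""
    n = 1 + max(max(l, k) for l, k in weights)
    u = [0.0] * n
    for idx, value in inputs.items():
        u[idx] = value
    used = set()
    for t in range(1, max_steps + 1):
        net = [0.0] * n
        for (l, k), w in weights.items():
            if u[l] != 0.0:              # only active neurons propagate
                net[k] += w * u[l]
                used.add((l, k))
        if net[halt_idx] > halt_threshold:
            return t, u, used            # self-delimiting halt
        new_u = [0.0] * n
        for idx, value in inputs.items():
            new_u[idx] = value           # inputs are clamped
        for block in blocks:
            winner = max(block, key=lambda k: net[k])
            if net[winner] > win_threshold:
                new_u[winner] = 1.0      # at most one winner per WITAS
        u = new_u
    return None, u, used


# Toy network: neuron 0 is an input, {1, 2} and {3, 4} are WITAS, 5 halts.
toy_weights = {(0, 1): 0.9, (0, 2): 0.2, (1, 3): 0.5,
               (1, 4): 0.8, (4, 5): 4.0, (2, 5): 4.0}
halt_step, final_u, used = run_slim_rnn(
    toy_weights, inputs={0: 1.0}, blocks=[[1, 2], [3, 4]], halt_idx=5)
print(halt_step, sorted(used))
```

In this toy run neuron 2 never wins its WITAS, so its outgoing connection to the halt neuron is never used. This is the gating effect that allows a task to depend on only a fraction of the weights.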
Apart from the online input, output and halt neurons, a fixed number $n_{ti}$ of neurons are set to be task
inputs. These inputs remain constant for $1 \le t \le t_{end}$ and serve as self-generated task specifications.
Finally, there is a subset of $n_s$ internal state neurons whose activations are considered as the final outcome
when the program halts. Thus a non-compression task is: Given a particular task input, interact with the
environment (read online inputs, produce outputs) until the network halts and produces a particular internal
state (the abstract goal), which is read from the internal state neurons. Since the SLIM RNN is a general
computer, it can represent essentially arbitrary computable tasks in this way. Fig. 2 illustrates the network's
activation spreading for a particular task. A more detailed discussion of SLIM RNNs and their efficient
implementation can be found in the original report [25].

The SLIM RNN is trained on the fovea environment described in Sec. 4.2 using the PowerPlay
framework according to Algorithm 3 below. The difference to Algorithm 2 lies in task set-specific details
such as the encoding of task inputs and the definition of 'inventing and learning' a task. The bit string
$p$ now encodes a set of $n_{ti}$ real numbers between 0 and 1 which denote the constant task inputs for this
program. Given a new set of task inputs, the new task is considered learned if the network halts and reaches
a particular internal state (potentially after interacting with the environment), and remains able to properly
reproduce the saved internal states for all previously learned tasks. This is implemented by first checking if
the network can halt and produce an internal state on the newly generated task inputs. Only if the network
cannot halt within a chosen fraction of the time budget dictated by $\mathit{length}(p)$, the length of program $p$,
the remaining budget is used for trying to learn the task using a simple mutation rule, by modifying a few
weights of the network. When $p$ is the single bit '0', the task is interpreted as a compression task. Here
compression either means a reduction of the sum of squared weights without increasing the total number of
connection usages by all previously learned tasks, or a reduction of the total number of connection usages
on all previously learned tasks without increasing the sum of squared weights.
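
The two-sided compression criterion can be written down compactly. The Python sketch below is one possible reading: weights are given as flat lists, the usage counts are totals over all repertoire tasks, and the minimum relative reduction of the sum of squared weights defaults to 1/1000 (the value reported in the appendix, interpreted here as a relative decrease). The function name `is_compression` is hypothetical.

```python
def is_compression(old_weights, new_weights, old_usages, new_usages,
                   min_rel_gain=1e-3):
    """Return True if the new solver counts as a successful compression."""
    old_ssw = sum(w * w for w in old_weights)
    new_ssw = sum(w * w for w in new_weights)
    # Case 1: smaller weights, connection usages not increased.
    smaller_weights = (new_ssw <= old_ssw * (1.0 - min_rel_gain)
                       and new_usages <= old_usages)
    # Case 2: fewer connection usages, sum of squared weights not increased.
    fewer_usages = new_usages < old_usages and new_ssw <= old_ssw
    return smaller_weights or fewer_usages


print(is_compression([1.0, 2.0], [0.9, 2.0], old_usages=10, new_usages=10))
```

A candidate that shrinks the weights but increases the number of connection usages is rejected, as is one whose weight reduction falls below the minimum gain.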



Algorithm 3 PowerPlay implementation for experiment 2
Initialize $s_0$ in some way
for i := 1, 2, . . . do
    for m := 1, 2, . . . do
        for all candidate programs p s.t. length(p) ≤ m do
            Set time_budget := $2^{m - \mathit{length}(p)}$
            if p encodes a compression task then
                Set $s_{temp} := s_{i-1}$
                while time_budget > 0 do
                    Create $s_i$ from $s_{temp}$ through random perturbation of a few connection weights
                    if compression is successful and time_budget > 0 then
                        Set $s_{temp} := s_i$
                    end if
                end while
            else
                while time_budget > 0 do
                    Create $s_i$ from $s_{i-1}$ through random perturbation of a few connection weights
                    From p generate task k
                    if $s_{i-1}$ does not solve k and $s_i$ solves k and $s_i$ solves all previous tasks in the repertoire and
                    time_budget > 0 then
                        Add the pair (k, internal state) to the repertoire
                        exit m loop
                    end if
                end while
            end if
        end for
    end for
end for



Since our PowerPlay variant methodically increases search time, half of which is used for
compression, it automatically encourages the network to invent novel tasks that do not require many changes of
weights used by many previous tasks.

Our SLIM RNN implementation efficiently resets activations computed by the numerous unsuccessful
tested candidate programs. We keep track of used connections and active (winner) neurons at each time 
step, to reset activations such that tracking/undoing effects of programs essentially does not cost more than 
their execution. 

4.2 RNN-Controlled Fovea Environment

The environment for this experiment consists of a static image which is observed sequentially by the RNN 
through a fovea, whose movement it can control at each time step. The size of the fovea is 81 x 81 pixels; it 
produces 25 real valued online inputs (normalized to [0, 1]) by averaging the pixel intensities over regions 
of varying sizes such that it has higher resolution at the center and lower resolution in the periphery
(Fig. 3). The fovea is controlled using 8 real-valued outputs of the network, and a parameter win-threshold.
Out of the first four outputs, the one with the highest value greater than win-threshold is interpreted as a
movement command: up, down, left, or right. If none of the first four outputs exceeds the threshold, the
fovea does not move. Similarly, the next four outputs are interpreted as the fovea step size on the image
(3, 9, 27 or 81 pixels in case of exceeding the threshold, 1 pixel otherwise).
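
The mapping from the eight network outputs to a fovea displacement can be sketched in Python as follows. The ordering of the outputs (up, down, left, right, then step sizes 3, 9, 27, 81) follows the order in which the text lists them and is an assumption, as are the image convention that y grows downwards, the default threshold value, and the function name `decode_fovea_command`.

```python
DIRECTIONS = [(0, -1), (0, 1), (-1, 0), (1, 0)]   # up, down, left, right
STEP_SIZES = [3, 9, 27, 81]                       # pixels


def decode_fovea_command(outputs, win_threshold=1e-5):
    """Translate 8 output activations into a (dx, dy) pixel displacement."""
    if len(outputs) != 8:
        raise ValueError("expected 8 output activations")
    move, size = outputs[:4], outputs[4:]
    best_move = max(range(4), key=lambda j: move[j])
    if move[best_move] <= win_threshold:
        return (0, 0)                             # no winner: fovea stays put
    best_size = max(range(4), key=lambda j: size[j])
    step = STEP_SIZES[best_size] if size[best_size] > win_threshold else 1
    dx, dy = DIRECTIONS[best_move]
    return (dx * step, dy * step)


print(decode_fovea_command([0.0, 0.9, 0.1, 0.0, 0.0, 0.0, 0.5, 0.0]))
```

If a direction wins but no step-size output exceeds the threshold, the fovea moves by a single pixel, which gives the network a fine-grained default.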

4.3 Results 

The network's internal states can be viewed as abstract summaries of its trajectories through the fovea 
environment and its parallel "internal thoughts." The system invents more and more novel skills, each 
breaking the generalization ability of its previous SLIM NN weight matrix, without forgetting previously 
learned skills. Within 8 hours on a standard PC, a SLIM RNN consisting of 20 WITAS, with 4 neurons in 
each WITAS, invented 67 novel action sequences guiding the fovea before halting. These varied in length, 
consuming up to 27 steps. Over time the SLIM NN not only invented new skills to solve novel tasks, but 
also learned to speed up solutions to previously learned tasks, as shown in Fig. 4. For clarity, all figures
presented here depict aspects of this same run, though results were consistent over many different runs. 

The SLIM NN also learns to reduce the interactions with the environment. Fig. 5 shows the number
of interactions required to solve certain previously learned fovea control tasks. Here an "interaction" is a
SLIM NN computation step that produces at least one non-zero output neuron activation. The general trend
over different tasks and runs is that the interactions decrease over time. That is, the SLIM NN essentially learns
to build internal representations of its interaction with the environment, due to PowerPlay's continual
built-in pressure to speed up and simplify and generalize.

Figure 4: For the first five self-invented non-compression tasks, we plot the number of connection usages
per task. In this run, solutions to 340 self-generated tasks were learned. 67 of them were non-compression
tasks (marked by small black lines at the top); the rest resulted in successful compressions of the SLIM
RNN's weight matrix. Over time, previously learned skills tend to require less and less computational
resources, i.e., the SLIM RNN-based solver learns to speed up its solutions to previous self-invented tasks.
Although some plot lines occasionally go up, this is compensated for by a decrease of connection usages
for dozens of other tasks (not shown here to prevent clutter).

The SLIM NN often uses partially overlapping subsets of connection weights for generating different
self-invented trajectories. Fig. 6 shows that not all connections are used for all tasks, and that the
connections used to solve individual tasks can become progressively more separated. In general, the variation in
degree of separation depends on network parameters and environment.

As expected, PowerPlay-based SLIM NNs prefer to modify only few connections per novel task.
Randomly choosing one to fifteen weight modifications per task, on average only 2.9 weights were changed
to invent a new skill (see Fig. 7). Why? Because PowerPlay is always going for the novel task that is
fastest to find and validate, and fewer weight changes tend to affect fewer previously learned tasks; that is,
less time is needed to re-validate performance on previous tasks. In this way PowerPlay avoids a naively
expected slowdown linear in the number of tasks. Although the number of skills that must not be forgotten
grows all the time, the search time for new skills does not have to grow in proportion to the number
of previously solved tasks.

As a consequence of its bias towards fast-to-validate solutions, the PowerPlay-based SLIM NN
automatically self-modularizes. The SLIM RNN tested above had 1120 connections. Typically, 600 of 
them were used to solve a particular task, but on average less than three of them were changed. This 
means that for each newly invented task, the system re-uses a lot of previously acquired knowledge without 
modification. The truly novel aspects of the task and its solution often can be encoded within just a handful 
of bits. 

This type of self-modularization is more general than what can be found in traditional (non-inventive)
modular reinforcement learning (RL) systems whose action sequences are chunked into macros to be
re-used by higher-level macros, like in the options framework [30], or in hierarchical RL [32]. Since the
SLIM RNN is a general computer, and its weights are its program, subsets of the weights can be viewed
as sub-programs, and new sub-programs can be formed from old ones in essentially arbitrary computable
ways, like in general incremental program search [21].

Figure 5: For only six selected tasks (to prevent clutter), we plot the number of interactions with the
environment, over a run where 67 novel non-compression tasks were learned, besides numerous additional
compression tasks ignored here. Here an interaction is a SLIM NN computation step that produces at least
one non-zero output neuron activation. The total number of interactions cannot exceed the number of steps
until the halt neuron is activated.

5 Discussion and Outlook 

PowerPlay for SLIM RNN represents a greedy implementation of central aspects of the Formal Theory
of Fun and Creativity [22, 23]. The setup permits practically feasible, curious/creative agents that learn
hierarchically and modularly, using general computational problem solving architectures. Each new task 
invention either breaks the solver's present generalization ability, or compresses the solver, or speeds it up. 

We can know precisely what is learned by PowerPlaying SLIM NN. The self-invented tasks are
clearly defined by inputs and abstract internal outcomes / results. Human interpretation of the NN's weight 
changes, however, may be difficult, a bit like with a baby that generates new internal representations and 
skills or skill fragments during play. What is their "meaning" in the eyes of the parents, to whom the 
baby's internal state is a black box? For example, in case of the fovea tasks the learner invents certain
input-dependent movements as well as abstractions of trajectories in the environment (limited by its vocabulary of
internal states). The RNN weights at any stage encode the agent's present (possibly limited) understanding 
of the environment and what can be done in it. 

PowerPlay has no problems with noisy inputs from the environment. However, a noisy version of an 
old, previously solved task must be considered as a new task, because in general we do not know what is 
noise and what is not. But over time PowerPlay can automatically learn to generalize away the "noise," 
eventually finding a compact solver that solves all "noisy" instances seen so far. 

Our first experiments focused on developmental stages of purely creative systems, and did not involve 
any externally posed tasks yet. Future work will test the hypothesis that systems that have been running 
PowerPlay for a while will be faster at solving many user-provided tasks than systems without such 
purely explorative components. This hypothesis is inspired by babies who creatively seem to invent and 
learn many skills autonomously, which then helps them to learn additional teacher-defined external tasks. 
We intend to identify conditions under which such knowledge transfer can be expected. 

Figure 6: Connection usage ratios for all SLIM RNN connections after learning 227 out of the 340 total
self-invented tasks, 50 of them non-compression tasks forming the so-called task repertoire, the rest
compression tasks. The usage ratio on the y-axis is the number of repertoire tasks using the connection, divided
by the number of repertoire tasks. This ratio is 1 for the first 110 connections, which are frequently used
outgoing connections from task and online inputs. The network learns to better utilize its own architecture
by using different connections for different tasks, thus reducing the number of connections with high usage
ratio. Such modularization can help to speed up task search in later stages.

A Appendix: Implementation details 

The SLIM RNN used for Experiment 2 (fovea control) is constructed as follows.

Let the number of input, output and state neurons in the network be n_input, n_output and n_state,
respectively. Let nb_comp be the number of computation blocks, each with block_size neurons. Thus there are
nb_comp × block_size computation neurons in the network.

The network is wired as follows. Each task input neuron is connected to nb_comp computation neurons
at random. Each online input neuron is connected to nb_comp/10 neurons at random. Each internal state
neuron receives connections from nb_comp/2 random computation neurons. The halt neuron receives
connections from nb_comp/2 random computation neurons. nb_comp × n_output random computation neurons
are connected to random output neurons. Each neuron in each computation block is randomly connected
to block_size other computation neurons.

We used nb_comp = 20, block_size = 4, and n_state = 3 with n_input = 25 and n_output = 8 for the
fovea control task. The halt-threshold was set to 3, and the WITAS and fovea control win-thresholds were
set to 0.00001. All connection weights were initialized to random values in [−1, 1]. The cost of using a
connection (consuming part of the time budget) was set to 0.1 for all connections. The mutation rule was
as follows. For non-compression tasks, the network is first run using the new task inputs to check if the task
can already be solved by generalization. If not, we randomly generate an integer number m between 1 and
1/50th of all connections used during the unsuccessful run, and randomly change m weights by adding to
them a uniformly random number in [−0.5, 0.5]. For compression tasks, we randomly generate a number
m between 1 and 1/50th of all connections used for any of the tasks in the current repertoire, and randomly
modify m of those connections.
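
The mutation rule can be sketched in Python as follows. The sketch assumes that weights are stored in a dictionary keyed by connection, that the weights to change are drawn from the used connections in both the compression and the non-compression case (the text states this explicitly only for compression), and that the upper bound is rounded down but never below 1. The function name `mutate` and its parameters are hypothetical.

```python
import random


def mutate(weights, used_connections, rng, divisor=50, delta=0.5):
    """Perturb between 1 and len(used)/divisor of the used connection weights.

    Returns a new weight dictionary and the list of changed connections;
    the input dictionary is left untouched.
    """
    candidates = sorted(used_connections)
    upper = max(1, len(candidates) // divisor)
    m = rng.randint(1, upper)
    chosen = rng.sample(candidates, min(m, len(candidates)))
    new_weights = dict(weights)
    for connection in chosen:
        new_weights[connection] += rng.uniform(-delta, delta)
    return new_weights, chosen


rng = random.Random(1)
weights = {(i, i + 1): 0.0 for i in range(200)}
new_weights, chosen = mutate(weights, set(weights), rng)
print(len(chosen))
```

Returning the list of changed connections makes it straightforward to determine which previously learned tasks must be re-validated after the mutation.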

The time budget fraction available to check whether a candidate task is not yet solvable by $s_{i-1}$ was
chosen randomly between 0 and time_budget/2. For compression tasks, the sum of squared weights had to
decrease by at least a factor of 1/1000 to be acceptable.

Figure 7: For each self-invented non-compression task, we plot the number of modified SLIM NN weights
needed to learn it without forgetting solutions to old tasks. During task search, the number of
connections to modify is chosen randomly. Once the growing repertoire has reached a significant size, however,
successfully learned additional tasks tend to require few weight changes affecting few previous tasks
(especially tasks with computationally expensive solutions). This is due to PowerPlay's bias towards tasks
that are fast to find and validate on the entire repertoire. See text for details.